Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Please refer to Spark Issue: Task Not Serializable for a similar serialization issue in Spark/Scala.
Symptom
_pickle.PicklingError: args[0] from newobj args has the wrong class
Cause
For example, if you have the following import
from nltk.corpus import stopwords
then calling the following inside a UDF or a pandas UDF might cause this issue:
stopwords.words("english")
Solution
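Simply move the call stopwords.words("english") out of the UDF or pandas UDF and assign its result to a global variable. The UDF can then reference that variable, so only the resulting list of words is pickled and shipped to the executors.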
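A minimal sketch of the fix described above, not a definitive implementation. The names STOPWORDS, remove_stopwords and build_udf are hypothetical and chosen only for illustration. The small fallback word set is also an assumption, added so the snippet still runs where NLTK or its stopwords corpus is not installed.

```python
# Build the stopword collection once, at module level, outside any UDF.
try:
    from nltk.corpus import stopwords
    STOPWORDS = frozenset(stopwords.words("english"))
except (ImportError, LookupError):
    # Assumption: tiny fallback used only when NLTK or its corpus is missing.
    STOPWORDS = frozenset({"the", "a", "an", "and", "of"})


def remove_stopwords(text):
    """Plain Python function that only reads the global STOPWORDS."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


def build_udf():
    """Wrap remove_stopwords as a Spark UDF (requires pyspark)."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(remove_stopwords, StringType())
```

With a hypothetical DataFrame df that has a string column named text, the UDF could be applied as df.withColumn("clean", build_udf()(df["text"])). The function body never touches the stopwords object itself, only the plain set of strings computed from it.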
References
About Python: Spark-submit raises a "Pickling error": "_pickle.PicklingError: args[0] from newobj args has the wrong class"
_pickle.PicklingError: args[0] from newobj args has the wrong class from cloudpickle.py